Intro

Data Exploration

Datasets

Missing values

Distribution

Categorical features

Other visualization methods

Numerical features

Outliers

Relationships

Feature Engineering

Missing values

Outliers

Check distributions after cleaning

Datetime

Geo

Binning

One-hot encoding
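
The feature names reported at the end of this outline (payment_type_group_CRD, vendor_id_VTS) follow the "<column>_<value>" pattern that pandas.get_dummies produces, so the encoding step may have looked roughly like the sketch below. The use of get_dummies, the toy rows, and the vendor value "CMT" are assumptions for illustration; the original code is not shown here.

```python
import pandas as pd

# Toy rows: column names are inferred from the reported feature names,
# the values (including the vendor "CMT") are made up for illustration.
trips = pd.DataFrame({
    "vendor_id": ["VTS", "CMT", "VTS"],
    "payment_type_group": ["CRD", "CSH", "CRE"],
    "fare_amount": [12.5, 7.0, 21.0],
})

# One indicator column per category value, named "<column>_<value>".
# Numeric columns such as fare_amount are passed through unchanged.
encoded = pd.get_dummies(trips, columns=["vendor_id", "payment_type_group"])
print(sorted(encoded.columns))
```

Listing the columns explicitly keeps the encoding limited to the categorical fields, which matters if other string columns (for example raw timestamps) are still in the frame.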

Modelling

Base Model

RandomForestRegressor

AdaBoostRegressor

GradientBoostingRegressor

Random Forest
Objective metric on test: 0.5317
Parameters: {'n_estimators': 80, 'min_samples_split': 3, 'max_depth': 10}

AdaBoost
Objective metric on test: 0.6182
Parameters: {'n_estimators': 60, 'learning_rate': 0.1}

Gradient Boosting
Objective metric on test: 0.5322
Parameters: {'n_estimators': 150, 'min_samples_split': 8, 'loss': 'ls', 'learning_rate': 0.1}
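
As a minimal sketch, the three configurations listed above could be instantiated and compared as follows. The data here is synthetic, and RMSE is an assumed stand-in, since the outline does not say what the "objective metric" is; the printed numbers will therefore not match the ones reported above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,
    RandomForestRegressor,
)
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real dataset is not part of this outline.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Random Forest": RandomForestRegressor(
        n_estimators=80, min_samples_split=3, max_depth=10, random_state=0
    ),
    "AdaBoost": AdaBoostRegressor(
        n_estimators=60, learning_rate=0.1, random_state=0
    ),
    # loss='ls' was renamed to 'squared_error' in scikit-learn 1.0 and the old
    # name was removed in 1.2. Squared error is the default, so it is left unset.
    "Gradient Boosting": GradientBoostingRegressor(
        n_estimators=150, min_samples_split=8, learning_rate=0.1, random_state=0
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # RMSE is an assumption; swap in whichever metric the project optimizes.
    scores[name] = mean_squared_error(y_test, model.predict(X_test)) ** 0.5
    print(f"{name}: RMSE on test = {scores[name]:.4f}")
```

Fixing random_state on each estimator and on the split makes the comparison repeatable between runs.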

Important Features

Model      RandomForest              AdaBoost                  GradientBoosting
Feature 1  payment_type_group_CRD    payment_type_group_CRD    payment_type_group_CRD
Feature 2  fare_amount               fare_amount               fare_amount
Feature 3  payment_type_group_CRE    trip_distance             payment_type_group_CRE
Feature 4  min_period                payment_type_group_CRE    min_period
Feature 5  vendor_id_VTS             payment_type_group_CSH    payment_type_group_Other
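
A ranking like the one above can be read from a fitted scikit-learn ensemble through its feature_importances_ attribute. The sketch below assumes that approach; the outline does not state how the ranking was actually produced. The column names come from the table, while the rows and target values are invented.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Tiny made-up frame: column names are taken from the table, values are invented.
X = pd.DataFrame({
    "fare_amount": [5.0, 12.5, 30.0, 8.0, 22.0, 15.5],
    "trip_distance": [0.8, 3.1, 9.4, 1.5, 6.0, 4.2],
    "payment_type_group_CRD": [1, 1, 0, 0, 1, 1],
})
y = [1.0, 2.5, 0.0, 0.0, 4.4, 3.0]  # hypothetical target values

model = RandomForestRegressor(n_estimators=80, random_state=0).fit(X, y)

# Impurity-based importances, labelled by column and sorted high to low.
# head(5) keeps at most five rows; this toy frame only has three features.
top_features = (
    pd.Series(model.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
    .head(5)
)
print(top_features)
```

AdaBoostRegressor and GradientBoostingRegressor expose the same attribute, so the same few lines can be reused for each model in the comparison.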